
    Low-cost Interference Mitigation and Relay Processing for Cooperative DS-CDMA Systems

    In wireless communications, propagation effects such as fading, shadowing and path loss are the major constraints that limit overall system performance. Severe fading degrades the received signals and can compromise both the transmission of information and the reliability of the network. Diversity techniques are therefore introduced to mitigate fading. Among the various kinds of diversity, cooperative diversity with relaying nodes has been widely studied in recent years as an effective tool to deal with this problem. Several cooperative protocols have been proposed in the literature; among the most effective are Amplify-and-Forward (AF) and Decode-and-Forward (DF). Cooperative diversity can be combined with direct-sequence code-division multiple-access (DS-CDMA) systems to further enhance information security. However, the multiple access interference (MAI) that arises from nonorthogonal received waveforms in DS-CDMA systems can easily degrade performance. To deal with this issue, a novel multiuser detection (MUD) technique is introduced as a relay processing strategy for the uplink of cooperative DS-CDMA systems. In addition, distributed space-time coding (DSTC) can be combined with cooperative diversity to further improve transmission performance, and a physical-layer network coding (PNC) scheme is adopted to increase the throughput of the cooperative DS-CDMA network. Better performance and lower power consumption can be obtained when appropriate relaying strategies are applied.
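    The MAI mechanism described above can be sketched in a few lines. This toy example (code lengths and signature sequences are made up for illustration, not taken from the paper) spreads one BPSK symbol per user and recovers user 1's symbol with a correlator: orthogonal Walsh codes yield zero cross-correlation and hence no MAI, while nonorthogonal codes leak user 2's symbol into user 1's decision statistic.

```python
import numpy as np

# Two users spread one BPSK symbol each with a length-8 signature code and
# transmit simultaneously; a matched-filter (correlator) receiver recovers
# user 1's symbol. The Walsh pair is orthogonal (cross-correlation 0); the
# second pair has cross-correlation -2, so user 2 interferes with user 1.
walsh = np.array([[ 1,  1,  1,  1,  1,  1,  1,  1],
                  [ 1, -1,  1, -1,  1, -1,  1, -1]])    # orthogonal pair
nonorth = np.array([[ 1,  1, -1,  1, -1, -1,  1,  1],
                    [ 1, -1, -1, -1,  1,  1,  1, -1]])  # nonorthogonal pair

bits = np.array([1, -1])            # user 1 sends +1, user 2 sends -1

for name, codes in [("orthogonal", walsh), ("nonorthogonal", nonorth)]:
    received = bits @ codes          # chip-level superposition of both users
    stat = received @ codes[0] / 8   # correlate with user 1's code
    print(f"{name}: statistic = {stat:+.3f}, MAI term = {stat - bits[0]:+.3f}")
```

With the orthogonal pair the statistic is exactly the transmitted +1; with the nonorthogonal pair it is biased by the cross-correlation term, which is the interference that MUD relay processing aims to suppress.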

    Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers

    Transformers have achieved great success in machine learning applications. Normalization techniques, such as Layer Normalization (LayerNorm, LN) and Root Mean Square Normalization (RMSNorm), play a critical role in accelerating and stabilizing the training of Transformers. While LayerNorm recenters and rescales input vectors, RMSNorm only rescales the vectors by their RMS value. Despite being more computationally efficient, RMSNorm may compromise the representation ability of Transformers. There is currently no consensus on the preferred normalization technique: some models employ LayerNorm while others use RMSNorm, especially among recent large language models, and converting a Transformer from one normalization to the other is challenging. We propose a solution that unifies two mainstream Transformer architectures, Pre-LN and Pre-RMSNorm Transformers. By removing the inherently redundant mean information in the main branch of Pre-LN Transformers, we can reduce LayerNorm to RMSNorm, achieving higher efficiency. We further propose the Compressed RMSNorm (CRMSNorm) and the Pre-CRMSNorm Transformer, based on a lossless compression of the zero-mean vectors. We formally establish the equivalence of the Pre-LN, Pre-RMSNorm, and Pre-CRMSNorm Transformer variants in both training and inference. This implies that Pre-LN Transformers can be substituted with Pre-(C)RMSNorm counterparts at almost no cost, offering the same arithmetic functionality along with a free efficiency improvement. Experiments demonstrate that we can reduce the training and inference time of Pre-LN Transformers by up to 10%.
    Comment: 15 pages, 5 tables, code available at https://github.com/ZixuanJiang/pre-rmsnorm-transforme
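    The core observation can be checked numerically. This minimal sketch (toy dimensions, no learned affine parameters, not the paper's code) shows that LayerNorm and RMSNorm coincide exactly on zero-mean vectors, which is why removing the mean from the main branch lets LayerNorm be replaced by the cheaper RMSNorm:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    x = x - x.mean()                            # recenter
    return x / np.sqrt((x ** 2).mean() + eps)   # rescale by std (= RMS of centered x)

def rms_norm(x, eps=1e-6):
    return x / np.sqrt((x ** 2).mean() + eps)   # rescale only, no recentering

x = np.array([2.0, -1.0, 0.5, 3.0])
x_centered = x - x.mean()                       # project out the mean direction

print(np.allclose(layer_norm(x), rms_norm(x)))                    # False in general
print(np.allclose(layer_norm(x_centered), rms_norm(x_centered)))  # True: they coincide
```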

    Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster

    In this work, we propose FastCoT, a model-agnostic framework based on parallel decoding that requires no training of an auxiliary model and no modification to the LLM itself. FastCoT uses a context window whose size varies with position to run parallel decoding and autoregressive decoding simultaneously, fully utilizing GPU compute resources. The parallel-decoding part gives the LLM a quick glance of the future in the form of approximate tokens, which can lead to answers faster than the regular autoregressive decoding used by causal Transformers. We also provide an implementation of parallel decoding within LLMs that supports KV-cache generation and batch processing. Through extensive experiments, we demonstrate that FastCoT reduces inference time by nearly 20% with only a negligible performance drop compared to the regular approach. Additionally, we show that the context window size exhibits considerable robustness across different tasks.
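    The flavor of Jacobi-style parallel decoding behind this "glance of the future" can be sketched with a toy deterministic next-token rule standing in for the LLM (everything below is an illustrative assumption, not the paper's implementation): all future positions are refreshed in one batched step from last round's approximate tokens, and decoding stops at a fixed point. When the approximate tokens are often already correct, the fixed point is reached in fewer rounds than the sequential steps autoregressive decoding would need.

```python
def toy_model(last_token):
    # Hypothetical "model": the next token is the successor character.
    return chr(ord(last_token) + 1)

def jacobi_decode(prompt, n):
    guess = [prompt[-1]] * n                 # crude initial glance at the future
    rounds = 0
    while True:
        left = [prompt[-1]] + guess[:-1]     # each position's left neighbor, last round
        new = [toy_model(t) for t in left]   # update all n positions "in parallel"
        rounds += 1
        if new == guess:                     # fixed point reached: decoding is done
            return "".join(new), rounds
        guess = new

tokens, rounds = jacobi_decode("a", 5)
print(tokens, rounds)   # converges to the autoregressive answer "bcdef"
```

Each round here is one batched evaluation of all positions; a real implementation amortizes those positions across spare GPU capacity alongside the ordinary autoregressive stream.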

    Robust inference with GhostKnockoffs in genome-wide association studies

    Genome-wide association studies (GWASs) have been extensively adopted to depict the underlying genetic architecture of complex diseases. Motivated by GWASs' limitations in identifying small-effect loci underlying polygenic complex traits and in fine-mapping putative causal variants from proxy ones, we propose a knockoff-based method that requires only summary statistics from GWASs, and we demonstrate its validity in the presence of relatedness. We show that GhostKnockoffs inference is robust to its input Z-scores as long as they come from valid marginal association tests and their correlations are consistent with the correlations among the corresponding genetic variants. This property generalizes GhostKnockoffs to other GWAS settings, such as meta-analysis of multiple overlapping studies and studies based on association test statistics that deviate from score tests. We demonstrate GhostKnockoffs' performance with empirical simulations and with a meta-analysis of nine European-ancestry genome-wide association studies and whole exome/genome sequencing studies. Both show that GhostKnockoffs identifies more putative causal variants with weak genotype-phenotype associations that are missed by conventional GWASs.
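    The selection step shared by GhostKnockoffs and other knockoff methods can be shown compactly. Each variable gets a feature statistic W_j that tends to be large and positive only when the original variable carries more signal than its knockoff; the "knockoff+" data-dependent threshold of Barber and Candès then controls the false discovery rate at level q. The W values below are made-up numbers for illustration, not from the paper.

```python
import numpy as np

def knockoff_threshold(W, q=0.1):
    # Smallest t whose estimated false discovery proportion is at most q:
    # fdp_hat(t) = (1 + #{W_j <= -t}) / max(1, #{W_j >= t}).
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf            # no threshold achieves FDR level q; select nothing

W = np.array([4.2, -0.3, 2.9, 0.8, -1.1, 3.5, 0.2, -0.4, 5.1, 3.8])
t = knockoff_threshold(W, q=0.25)
selected = np.where(W >= t)[0]
print(t, selected)           # variables whose statistic clears the threshold
```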

    Second-order group knockoffs with applications to GWAS

    Conditional testing via the knockoff framework allows one to identify -- among a large number of possible explanatory variables -- those that carry unique information about an outcome of interest, and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants which influence traits of medical relevance. While conditional testing can be both more powerful and precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors. This impasse can be overcome by shifting the object of inference from single variables to groups of correlated variables. To achieve this, it is necessary to construct "group knockoffs." While successful examples are already documented in the literature, this paper substantially expands the set of algorithms and software for group knockoffs. We focus in particular on second-order knockoffs, for which we describe correlation matrix approximations that are appropriate for GWAS data and that result in considerable computational savings. We illustrate the effectiveness of the proposed methods with simulations and with the analysis of albuminuria data from the UK Biobank. The described algorithms are implemented in the open-source Julia package Knockoffs.jl, for which both R and Python wrappers are available.
    Comment: 46 pages, 10 figures, 2 tables, 3 algorithms
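    For readers unfamiliar with the "second-order" construction, here is a minimal sketch of a second-order Gaussian (model-X) knockoff with the equicorrelated choice of s, shrunk slightly so the conditional covariance stays positive definite. The dimensions and the AR(1) covariance are illustrative assumptions, and none of this is the Knockoffs.jl API:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, rho = 4000, 5, 0.5
idx = np.arange(p)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))   # AR(1) correlation matrix
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Equicorrelated s, shrunk by 0.9 so V below is strictly positive definite.
s = 0.9 * min(1.0, 2 * np.linalg.eigvalsh(Sigma).min())
D = s * np.eye(p)
Sigma_inv = np.linalg.inv(Sigma)

mu = X - X @ Sigma_inv @ D          # conditional mean E[X_knock | X]
V = 2 * D - D @ Sigma_inv @ D       # conditional covariance Cov[X_knock | X]
X_knock = mu + rng.standard_normal((n, p)) @ np.linalg.cholesky(V).T

# Second-order validity: Cov(X_knock) matches Sigma, and the cross-covariance
# Cov(X, X_knock) matches Sigma - D (equal to Sigma off the diagonal).
print(np.max(np.abs(np.cov(X_knock, rowvar=False) - Sigma)))
```

The group-knockoff algorithms in the paper generalize the diagonal D to a block-diagonal matrix matched to groups of correlated variants, which is what makes the approach workable for GWAS-scale correlation structure.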

    A compact butterfly-style silicon photonic-electronic neural chip for hardware-efficient deep learning

    The optical neural network (ONN) is a promising hardware platform for next-generation neurocomputing due to its high parallelism, low latency, and low energy consumption. Previous ONN architectures are mainly designed for general matrix multiplication (GEMM), leading to unnecessarily large area cost and high control complexity. Here, we move beyond classical GEMM-based ONNs and propose an optical subspace neural network (OSNN) architecture, which trades the universality of weight representation for lower optical component usage, area cost, and energy consumption. We devise a butterfly-style photonic-electronic neural chip to implement our OSNN with up to 7x fewer trainable optical components than GEMM-based ONNs. Additionally, a hardware-aware training framework is provided to minimize the required device programming precision, lessen the chip area, and boost the noise robustness. We experimentally demonstrate the utility of our neural chip in practical image recognition tasks, showing that a measured accuracy of 94.16% can be achieved in hand-written digit recognition tasks with 3-bit weight programming precision.
    Comment: 17 pages, 5 figures
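    The parameter saving behind butterfly-style meshes is easy to see in software: an n x n butterfly transform is a product of log2(n) sparse stages, each made of n/2 independent 2x2 mixing blocks, so it needs O(n log n) programmable values instead of the n^2 of a general GEMM weight matrix. The sketch below is a generic real-valued butterfly for counting parameters, not the chip's photonic implementation:

```python
import numpy as np

def butterfly_stage(n, stride, rng):
    """One stage: mix each index i with its partner i XOR stride via a 2x2 block."""
    B = np.zeros((n, n))
    for i in range(n):
        j = i ^ stride
        if i < j:                        # fill one 2x2 block per (i, j) pair
            B[np.ix_([i, j], [i, j])] = rng.standard_normal((2, 2))
    return B

rng = np.random.default_rng(0)
n = 64
stages = [butterfly_stage(n, 1 << k, rng) for k in range(6)]   # log2(64) = 6 stages
W = np.linalg.multi_dot(stages)          # dense n x n transform realized by the mesh

dense_params = n * n                                           # 4096 for GEMM
butterfly_params = sum(int((B != 0).sum()) for B in stages)    # 6 stages x 128 = 768
print(dense_params, butterfly_params, dense_params / butterfly_params)
```

At n = 64 the butterfly factorization uses 768 programmable values against 4096 for a dense matrix, a ratio in the same ballpark as the up-to-7x component saving reported for the chip.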